Improving Word Alignment using Word Similarity
Abstract
We show that semantic relationships can be used to improve word alignment, in addition to the lexical and syntactic features that are typically used. In this paper, we present a method based on a neural network to automatically derive word similarity from monolingual data. We present an extension to word alignment models that exploits word similarity. Our experiments, in both large-scale and resource-limited settings, show improvements in word alignment tasks as well as translation tasks.
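The abstract does not say how similarity between words is scored once a neural network has produced word representations. As a minimal sketch, assuming each word is represented by a dense vector, cosine similarity is one common choice; the function names and the toy vectors below are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Rank every other word in `embeddings` by similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, for illustration only (not learned from data).
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

ranking = most_similar("car", toy_embeddings)
```

With these toy vectors, `ranking` places "automobile" ahead of "banana". An alignment model could in principle use such scores to share evidence between similar words, but how the paper does this is not stated in the abstract.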
Similar resources
Improving Word Alignment by Exploiting Adapted Word Similarity
This paper presents a method to improve a word alignment model in a phrase-based Statistical Machine Translation system for a low-resourced language using a string similarity approach. Our method captures similar words that can be seen as semi-monolingual across languages, such as numbers, named entities, and adapted/loan words. We use several string similarity metrics to measure the monolingual...
Improving Low-Resource Statistical Machine Translation with a Novel Semantic Word Clustering Algorithm
In this paper we present a non-language-specific strategy that uses large amounts of monolingual data to improve statistical machine translation (SMT) when only a small parallel training corpus is available. This strategy uses word classes derived from monolingual text data to improve the word alignment quality, which generally deteriorates significantly because of insufficient training. We pres...
Semantic Mapping Using Automatic Word Alignment and Semantic Role Labeling
To facilitate the application of semantics in statistical machine translation, we propose a broad-coverage predicate-argument structure mapping technique using automated resources. Our approach utilizes automatic syntactic and semantic parsers to generate Chinese-English predicate-argument structures. The system produced a many-to-many argument mapping for all PropBank argument types by computi...
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings
One of the most important problems in machine translation (MT) evaluation is to evaluate the similarity between translation hypotheses with different surface forms from the reference, especially at the segment level. We propose to use word embeddings to perform word alignment for segment-level MT evaluation. We performed experiments with three types of alignment methods using word embeddings. W...
Cognates and Word Alignment in Bitexts
We evaluate several orthographic word similarity measures in the context of bitext word alignment. We investigate the relationship between the length of the words and the length of their longest common subsequence. We present an alternative to the longest common subsequence ratio (LCSR), a widely-used orthographic word similarity measure. Experiments involving identification of cognates in bite...
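For reference, the longest common subsequence ratio (LCSR) mentioned in the last snippet is usually defined as the length of the longest common subsequence of two words divided by the length of the longer word. The sketch below follows that standard definition; the function names are hypothetical, and it does not implement the alternative measure that the snippet refers to, which is not described here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """LCS length divided by the length of the longer string."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty strings
    return lcs_length(a, b) / max(len(a), len(b))


# Example pair: English "colour" and French "couleur".
score = lcsr("colour", "couleur")
```

Here the longest common subsequence is "colur" (length 5) and the longer word has 7 letters, so `score` is 5/7.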